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Abstract 

High-resolution data of online chats are studied as a physical system in laboratory in order to quantify collec- 
tive behavior of users. Our analysis reveals strong regularities characteristic to natural systems with additional 
features. In particular, we find self-organized dynamics with long-range correlations in user actions and persistent 
associations among users that have the properties of a social network. Furthermore, the evolution of the graph 
and its architecture with specific k-core structure are shown to be related with the type and the emotion arousal of 
exchanged messages. Partitioning of the graph by deletion of the links which carry high arousal messages exhibits 
critical fluctuations at the percolation threshold. 



* All data used in this study are fully anonymized. 
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I. INTRODUCTION 



Quantitative analysis of human collective dynamics has recently become available based on the high- 
resolution empirical data from online communication systems, which can be studied as complex dynam- 
ical systems in physics laboratory. The self-organized dynamics, common to the online interactions, is 
crucial for the emergence of collective behaviors of users iil-yl]. The kind of social structures built in the 
dynamics, however, may depend on the technology features of the communication platform and other 
details such as visibility of user-to-user messages. Such details may affect actions of individual users 
and thus influence the course of events. For instance, the dynamics of message exchange along "friend- 
ship" links in the social network MySpace has been shown [4(] to yield structures much different from 
the conventional social networks o-LZD. Apart from the online social networks, the online games 
jlogs, diggs, forums |3, uj, III etc, can also lead to recognizable user associations. In the online games 
8i|9|], for example, the users can mark friend/enemy relationship or undertake collective actions towards 
other users. On the other hand, indirect interactions on blogs provide hidden mechanisms in which the 
subjects of the posts and negative emotion (critique) dominate, leading to user communities centered 
around certain popular posts j^Hu]]. Furthermore, frequency of interactions, subject of the communica- 
tion, and amount of emotion conveyed in these messages may become critical for collective dynamics 
and for building specific associations among users over time. These aspects of the online communi- 
cation dynamics represent a major challenge for quantitative analysis and theoretical modeling. Ideal 
empirical systems where this can be studied are the IRC (Internet-Relay-Chat) channels, where no a 
priori relationship among users exists. 

In this paper we combine statistical physics with computer science methods of text analysis to perform a 
quantitative analysis of human collective behavior in online chats. A large dataset from IRC Ubuntu 
channel is analysed considering user-to-user communications with full text of messages and assessing 
emotion contents in the text. In addition to humans, the data contains a Web robot (bot), which serves 
predefined text messages upon users' request. By mapping the data onto directed weighted network, 
and analyzing its topology in connection with temporal patterns of user actions, we find that a new type 
of techno-social structure emerges, which persists for a long time. We further analyze its architecture by 
testing the validity of the "social" hypothesis and study the percolation transition that is connected with 
the amount of emotional arousal on the links. Both the graph architecture and its social ties suggest that 
a specific type of online social network is assembled based on the emotion-carrying communications. 
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Data structure. We consider data from the publicly accessible IRC channel related to the development of 



the Ubuntu operating system [12]. The data contain both humans and a bot. Users, identified by IDs, 
typically exchange short text messages in seeking information on software or services, joining group 
discussions, or conducting open-domain chats. The data considered in this study are for one year period 
(year 2009), and contain texts of messages and time resolution of one minute. After retrieval of all the 
logs for a given time window, anonymization by substituting user IDs by random number references, 
and removing the spam, the utterances are annotated with appropriate sentiment detecting tools. In 
particular, in this study we use the lexicon-based annotation, by which emotion carrying words in a 



message are detected 11311 . Such words carry emotional arousal — degree of reactivity, and valence 



pleasure or displeasure, listed in the range (1—9) in the emotional dictionary 01311 . from which then 
the average arousal and valence are computed for the whole message. Moreover, the content of each 
message is determined according to one (or more) dialog act classes. In total, twelve such classes can 



be determined 11411 . For instance, three classes that we consider here (yes-no question, why-question, 



statement), in common "question"-type messages, appear to be characteristic to a certain group of users. 



II. SELF-ORGANIZED DYNAMICS OF CHATS AND NETWORK EVOLUTION 



The data are filtered to identify unique user IDs (the ID labeled by "35177" belonging to the bot is 
treated equally). The subset where user-to-user communications are clearly identified is selected and 
mapped onto a directed network. The network nodes i = 1,2 - --N represent users and the directed link 
i — > j indicates that at least one message from user i to user j occurred within the considered time 
window. Multiple messages along the link increase its width, while the message emotion and message 
category are considered as properties of that link. Starting from the beginning of the dataset, the network 
evolves in time by addition of new users and new links and increasing the widths of the existing links. 
Fig. Q] shows how the network grows within first three days. After one year (data limit) the network 
consists of N = 85185 users with the link density p = L/N(N — 1) = 1.244 x 10~ 4 and link reciprocity 
r = ir r jL = 0.284. 

In the temporal patterns of user actions we noticed that some users appear for a short time and disappear 
from the channel, while others are constantly or occasionally active within a long time period. We 
examine the lifetime of links that the users establish between each other. The distribution of the lifetimes 
of all links in the dataset is shown in Fig. [2] The three curves are, respectively, for all kinds of links, 
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FIG. 1 : Daily growth of the directed network of chats by addition of new users (time series in the inset) 
and by addition of links and increase of the links intensity. 



and for the links that carry an overall positive and negative emotion valence. Apart from the frequency, 
the patterns are similar: For lifetimes shorter than one day, the distribution decays faster than for the 
lifetimes longer than one day. The power-law with a small exponent of the distribution in the latter 
region suggests that the links that survive the first day after the appearance are likely to persist for 
a longer time (till data limit). The network that contains only such persistent links, here termed the 
persistent network, or UbuNetP, is of our interest and can be considered as a potential social structure. 
Thus, we remove the links which do not survive over the first day after their appearance. Note that in 
this way a number of users are also removed. (The user survival distribution follows a similar pattern, 
not shown). In the following we explore the emergent network with the persistent links and study how 
the above mentioned link properties, in particular the message types and their emotion contents, are 
related with the network architecture. 

These network connections emerge through a self-organized dynamics, in which users play different 
functions and over time they "settle" in the evolving environment. The cooperative behavior is quantified 
by analysis of correlated fluctuations in the time series of user actions. Specifically, we construct the 
time series of the number of messages per one minute time bin. In Fig. [3] an example of the time 
series is shown in the bottom panel. For better view, only a small part corresponding to eight days 
is shown, while the length of the considered time series is T = 262144 bins. The power spectrum of 
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FIG. 2: Distribution of the lifetime of links with positive and negative overall emotion messages, and 

the links with all types of messages. 
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FIG. 3: (bottom) Time series of the number of messages and (top) its power spectrum. Inset: Largest 
fluctuations of the integrated time series scaled by its standard dispersion plotted against time interval. 

the time series is given in the top panel in Fig. [3] The inset shows the fluctuations R t /o t ~ * > scaled 
by the standard deviation a t , of the integrated time series Y t = J^. =1 [x(t)— < x(%) >] as a function of 
varying time window t = 1,2,- -T. The slope determines the Hurst exponent H. Apart from the peak 
corresponding to daily periodicity, the spectrum shows long-range correlations of the Gaussian noise 
type with S(v) ~ l/v^ in a wide range of frequency index v, which is indicated by the straight line. 
The correlation range, considered in the time domain, agrees well with the persistence time window 
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of the fluctuation, i.e., t G [5,32000]. Note also that the values of the exponents = 0.48 ±0.03 and 
H = 0.72 ± 0.06 satisfy (within error bars) the scaling relation = 2H — 1 expected for the colored 
noise signals. Furthermore, the value of the Hurst exponent H > 1/2 suggests that a persistent type of 
fluctuations occurs in the system as a whole. T Note that the Hurst exponent in this range is also found 
for the time series of messages carrying emotional content along a particular link or by a particular user 



[15 



161. 



III. CONTENT-BASED STRUCTURE AND RESILIENCE OF THE SOCIAL GRAPH 

The network with persistent links is further examined as a "social" structure. We recall that "weak- 
ties" hypothesis has recently been confirmed in the data of mobile phone, online games, and MySpace 
dialogs networks Q4J, l8L l2fc LllZtl - In analogy to the social networks, the test is performed by computing 
the overlap O — the number of common neighbors of two nodes which share a link, as a function of the 
width W and of the betweenness centrality B of that link. In particular, the overlap increases with the 
intensity of communications W between the nodes as OiW) ~ W 1 , while it decreases with the centrality 
of the links as O(B) ~ B~^ 2 . Both laws hold in the above mentioned online systems, all of which have 
a typical community structure. The corresponding exponents are close to the universal values T]i = 1/3 
and T72 = 1/2, which are recently conjectured in |9|]. 

In the emergent networks of chats, the situation is somewhat different in that no classical community 
structure is found. Moreover, the network shows hierarchical organization of links with a large "core", 
but the links between nodes at different hierarchy levels also occur. Thus the network remains con- 
nected even if the central core is removed 11811 . Consequently, we find that the overlap has a different 
dependence of betweenness centrality, as shown in Fig. 01 The inset shows the overlap as function of W. 
The three curves are for our persistent network UbuNetP, and for two subnetworks described below. 
One subnetwork is obtained by removing the bot and all its links. The other is a reduced network shown 
in the top Fig. HJ whose structure reflects the type of messages that the users exchange: the outer layer 
(blue nodes) represents the users who send question-type of messages twice more often than the aver- 
age frequency on the channel, and no other message types. They are attached to the central core (red 
nodes), which consists of the bot and the users who frequently use all types of messages. Qualitatively, 
the "social ties" behaviors are present, but the corresponding exponents are twice larger, T71 « 0.66 and 
772 ~ 1, except for a small parts of the curves (marked by the red dashed line), which agree with the 
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FIG. 4: Top: Network with central core nodes (red) and users whose activity is dominated by 
question-type messages (blue). Bottom: The overlap O of the adjacent nodes plotted against the 
betweenness centrality B, main figure, and the weight W of their link, inset, for the persistent network 
UbuNetP and two its subgraphs, described in the text. 



other online social systems. The origin of such behavior as well as the sudden increase of the overlap 
at high B values needs theoretical modeling and analysis of more similar systems, which is left out of 
the present work. It is interesting to note that qualitatively same behavior is found in the network from 
which the bot is removed. 

The chat network has a structure with a power-law distribution for in- and out-degree with a similar 
exponent z q = 2.0 ±0.02 and disassortative mixing. A detailed analysis is left for a separate study 11511 . 
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FIG. 5: Relative size of the giant cluster S max /N and the susceptibility % plotted against rescaled 
threshold arousal of the links a. Inset: S max /N against distance from the percolation point p K — p\, 

Ke (a,p,q). 

Hereafter we focus on the resilience of the persistent chats network by analyzing the percolation transi- 
tion that occurs in relation to the arousal of the messages carried along the directed links. The rationale 
is that the links persist over a long time due to certain attribute (or "excitement", properly quantified 
by the emotional arousal) of the messages exchanged along them. It is worth noting that the weights 
and the averaged arousal of the links obey a power-law and a skewed Gaussian distribution, respectively 



11511 . Starting from the whole persistent network (with N = 6168 users and L = 33838 links) we cut the 
directed links whose average arousal exceeds a threshold value, i.e., aij/a max > a. For convenience it 
is normalized by a max , the largest average arousal found on the network. Note that, depending on the 
actual value of the threshold a, the number of links that satisfy the condition as well as their position on 
the network can vary considerably. Together with the network topology, directedness, and reciprocity 
of the links, this fact of the content-based topology determines the features of the percolation transition 
Jl^l . For each threshold a the largest cluster (as weakly connected component) is found and its size 
Smax/N is plotted against the threshold a in Fig. [5] Also shown is the susceptibility %, which is defined 
as X = L.s<s s n s . It determines the fluctuations of the cluster size in response to cutting the links with 
a given arousal. The distribution of cluster size satisfies the power law n s ~ s -2-6±0.08 j n a g reemen t 
with the percolation theory, % exhibits a peak at the transition point a* = 0.561, where the S max /N tends 
to vanish. 
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What kind of geometry and/or complexity change 1119142211 in the network structure occurs at the transi- 



tion point a*? We examine the fraction of retained links p, and the average degree q of the giant cluster, 
for the range of values of a > a* above the transition point. In the inset of Fig. [5] we show the scaling 
region S max /N ~ (p K — p* K )^ , where the index indicates K = (a,p,q). The critical value p* = 0.015 and 
q* = 1.944. The exponent j8 is close to one, i.e., j8 = 1.06±0.04, 1.00±0.05, 1.0±0.1 for K = a, p,q, 
respectively. 



IV. CONCLUSION 



Applying physics methods we have studied the self-organized dynamics of user-to-user interactions 
in the online chats, and have shown that both the type of the messages and their emotional arousal 
play an essential role in building a persistent techno-social structure. The social graph is hierarchically 
organized having a central core with some very active users and the Web bot. In contrast to more familiar 
user grouping into communities, this kind of spontaneous organization marks a new class of structures 
in the zoo of emergent online communication networks. The observed percolation transition on this 
graph is smooth with the enhanced fluctuations in response to a targeted deletion of links which carry 
the emotional arousal over a threshold value. Our analysis reveals that the contents of the exchanged 
messages and the role that users assume in the dynamics need to be considered as an integral parts of 
the topology in the online social graphs. In general, the idea to consider features beyond conventional 
interactions, that we pursued here for the analysis of human online interactions, can also be useful for 
studying physical systems which are assembled from complex molecules or other objects in laboratory. 
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